Generic Processes to Use S3L
============================

We will show you two generic processes for getting started with S3L:

1. Experiment Framework
2. Call Algorithms Directly

Experiment Framework
--------------------

We provide built-in experiment processes for different semi-supervised settings
and input data, such as inductive/transductive, with/without graph, given data
split/random split, and so on. The experiment class implements the following
steps: ``load data``, ``data split``, ``hyper-parameters search`` and
``evaluate the selected model on the test data``. To accelerate the
experiments, we also support multi-processing with ``joblib``.

The experiment framework allows you to evaluate supervised/semi-supervised
learning algorithms in fewer than ten statements. For example,

.. code:: python

    import sys
    import os

    from s3l.Experiments import SslExperimentsWithGraph
    from s3l.classification.LPA import LPA

    if __name__ == '__main__':
        # algorithm configurations: (name, estimator, hyper-parameter grid)
        configs = [
            ('LPA', LPA(), {
                'kernel': ['rbf'],
                'n_neighbors': [3, 5, 7]
            })
        ]
        # (name, feature_file, label_file, split_path, graph_file)
        datasets = [
            ('ionosphere', None, None, None, None)
        ]

        experiments = SslExperimentsWithGraph(n_jobs=1)
        experiments.append_configs(configs)
        experiments.append_datasets(datasets)
        experiments.set_metric(performance_metric='accuracy_score')

        results = experiments.experiments_on_datasets(
            unlabel_ratio=0.75, test_ratio=0.2, number_init=4)

        # do something with results

The above code evaluates the ``Label Propagation`` algorithm on the built-in
dataset ``ionosphere``. The best model is searched over the ``rbf`` kernel with
``n_neighbors`` in the range [3, 5, 7]. Finally, the accuracy score is reported
in the local variable ``results``.

Call Algorithms Directly
------------------------

The built-in algorithms can be called directly, as in the ``sklearn`` package.
The algorithms we have implemented are listed `here `_. After reading the
examples of an algorithm on its module page, you can easily try out any
semi-supervised algorithm you like. For example,

.. code:: python

    import sys
    import os
    import numpy as np

    from s3l.classification.TSVM import TSVM
    from s3l.metrics.performance import accuracy_score
    from s3l.datasets import base, data_manipulate

    if __name__ == '__main__':
        datasets = [
            ('house', None, None),
        ]
        # fraction of the data treated as unlabeled
        unlabel_ratio = 0.75

        for name, feature_file, label_file in datasets:
            # load dataset
            X, y = base.load_dataset(name, feature_file, label_file)

            # split into labeled/unlabeled indexes
            _, _, labeled_idxs, unlabeled_idxs = \
                data_manipulate.inductive_split(
                    X=X, y=y, test_ratio=0.,
                    initial_label_rate=1 - unlabel_ratio,
                    split_count=1, all_class=True)

            labeled_idx = labeled_idxs[0]
            unlabeled_idx = unlabeled_idxs[0]

            # fit TSVM with default hyper-parameters and predict on unlabeled data
            tsvm = TSVM()
            tsvm.fit(X, y, labeled_idx)
            pred = tsvm.predict(X[unlabeled_idx])

            print("Accuracy_score: {}".format(
                accuracy_score(y[unlabeled_idx], pred)))

The above code runs ``TSVM`` (Transductive Support Vector Machine) with default
hyper-parameter settings, given features ``X``, labels ``y`` and the indexes of
labeled data ``labeled_idx``. Then, the prediction is evaluated with the
accuracy score on the unlabeled data.
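
The same direct-call pattern should carry over to the other built-in
estimators. Below is a minimal sketch that swaps in ``LPA`` from the first
example, reusing ``X``, ``y`` and the index split produced by the code above.
It assumes ``LPA`` exposes the same ``fit(X, y, labeled_idx)`` / ``predict(X)``
interface as ``TSVM``; check its module page for the exact signature before
relying on it.

.. code:: python

    from s3l.classification.LPA import LPA

    # Hedged sketch: assumes LPA follows the same fit(X, y, labeled_idx) /
    # predict(X) convention as TSVM above; verify against the LPA module page.
    # Reuses X, y, labeled_idx and unlabeled_idx from the previous example.
    lpa = LPA(kernel='rbf', n_neighbors=5)  # values taken from the search grid above
    lpa.fit(X, y, labeled_idx)
    pred = lpa.predict(X[unlabeled_idx])

    print("Accuracy_score: {}".format(
        accuracy_score(y[unlabeled_idx], pred)))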